
Last Update: 2025/3/26

Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.
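To make "a number of requests within a specified period" concrete, here is a sketch of a sliding-window counter. It is a generic illustration, not a description of how this API actually enforces its limits. `max_requests`, `window_seconds`, and the injectable `clock` are assumptions chosen for the example. A client can use the same structure to pace its own calls below a limit it knows about.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allows at most `max_requests` calls within any `window_seconds` span.

    Illustrative only: the real limits and enforcement algorithm are not
    documented here; `max_requests` and `window_seconds` are assumptions.
    """

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._timestamps: Deque[float] = collections.deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have slid out of the current window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0,
                               clock=lambda: fake_now[0])
print([limiter.try_acquire() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 60.0
print(limiter.try_acquire())  # True: the first three requests left the window
```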

Warning: If a request has not completed within 6 minutes, the server closes the connection.
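One way to work within both the 6-minute cutoff and the rate limits is to cap each attempt's client-side timeout at 360 seconds and back off before retrying a failed request. The sketch below assumes a hypothetical `request` callable that performs one API call with the given timeout, and a hypothetical `RetryableError` that your client raises for failures worth retrying (for example, a rate-limit rejection or a timeout). This page does not specify which errors the API returns, and the delay values are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# From this page: the server closes a connection after 6 minutes.
SERVER_TIMEOUT_SECONDS = 360.0


class RetryableError(Exception):
    """Hypothetical error your client raises for retryable failures (e.g. rate limited, timed out)."""


def call_with_backoff(
    request: Callable[[float], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,  # illustrative values, not documented limits
    max_delay: float = 30.0,
    timeout: float = SERVER_TIMEOUT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(timeout), retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Never wait client-side longer than the server keeps the connection open.
    timeout = min(timeout, SERVER_TIMEOUT_SECONDS)
    for attempt in range(max_attempts):
        try:
            return request(timeout)
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Full jitter spreads retries from many clients over time, so they do not all hit the API again at the same moment.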

Why do we have rate limits?

Rate limits are a common practice for APIs, and they're implemented for several important reasons:

1. Protection Against Abuse or Misuse

Rate limits help protect the API from being overloaded or disrupted by malicious actors. For example, a malicious user could flood the API with requests in an attempt to overwhelm the system or cause downtime. By setting rate limits, OpenAI can prevent such harmful activities.

2. Ensuring Fair Access

Rate limits help ensure that all users have fair access to the API. Without rate limits, a single user or organization making an excessive number of requests could slow down the service for everyone else. Limiting how many requests each user can make helps everyone reach the API without significant delays.

3. Managing Infrastructure Load

Rate limits help OpenAI manage the overall load on its infrastructure. A sudden, sharp increase in requests could strain the servers and degrade performance. By setting rate limits, OpenAI can keep the system stable and provide a consistent experience for all users.

Need Higher Throughput?

If you’re looking to apply for higher concurrency or need more flexibility, please reach out to us directly. We’re happy to discuss your specific requirements and help find a setup that meets them.
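Until a higher limit is agreed, a client can keep its own number of in-flight requests under its current cap. The sketch below assumes an asyncio-based client. `MAX_CONCURRENCY` is a hypothetical placeholder (this page does not state a concurrency limit), and each task is a hypothetical zero-argument function that starts one API call.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")

MAX_CONCURRENCY = 5  # hypothetical; use the limit that applies to your account


async def run_bounded(tasks: Iterable[Callable[[], Awaitable[T]]],
                      limit: int = MAX_CONCURRENCY) -> List[T]:
    """Run coroutine factories with at most `limit` in flight at once, preserving input order."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(make_call: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits while `limit` calls are already running
            return await make_call()

    return list(await asyncio.gather(*(guarded(t) for t in tasks)))
```

A semaphore caps concurrency rather than request rate, so it complements, rather than replaces, pacing requests per time window.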